UTX 1.11, a Simple and Open User Dictionary/Terminology Standard, and its Effectiveness with Multiple MT Systems

Authors

  • Seiji Okura
  • Yuji Yamamoto
  • Hajime Ito
  • Michael Kato
  • Miwako Shimazu
  • Francis Bond
Abstract

We formulated the dictionary/glossary format UTX 1.11 and released it in May 2011. UTX 1.11 is a simple format that is friendly to both computers and humans. UTX dictionaries can be used not only as machine-readable dictionaries for rule-based machine translation (MT) systems, but also for computer-aided translation by human translators. The initial objective of UTXSimple 1.00, released in 2008, was to improve the accuracy of various MT systems by specifying a common format. A key feature of the latest version, UTX 1.11, is a term management mechanism that introduces four term statuses ("provisional," "forbidden," "approved," and "non-standard"). We show that a UTX 1.11-based dictionary originally created as a glossary is highly effective for improving the accuracy of MT. UTX can be widely and successfully applied in various fields with specialized terminology, such as localization, open source, education, administration, medicine, and law.
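
As a rough illustration of how the four term statuses could drive term management, the following minimal sketch reads a small tab-separated glossary and selects entries by status. The three-column layout (`src`, `tgt`, `status`), the sample entries, and the helper names `load_entries` and `terms_with_status` are assumptions made for this example; they are not taken from the UTX 1.11 specification, which should be consulted for the actual header and column definitions.

```python
import csv
import io

# The four term statuses named in the abstract.
STATUSES = {"provisional", "forbidden", "approved", "non-standard"}

# Hypothetical sample data. The column layout is an assumption for
# illustration only, not the layout defined by the UTX 1.11 specification.
SAMPLE = (
    "#src\ttgt\tstatus\n"
    "machine translation\t機械翻訳\tapproved\n"
    "mechanical translation\t機械翻訳\tforbidden\n"
    "user dictionary\tユーザー辞書\tprovisional\n"
)


def load_entries(text):
    """Parse tab-separated lines into dicts, skipping lines that start with '#'."""
    entries = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        src, tgt, status = row
        if status not in STATUSES:
            raise ValueError(f"unknown term status: {status!r}")
        entries.append({"src": src, "tgt": tgt, "status": status})
    return entries


def terms_with_status(entries, status):
    """Return the source terms whose status matches the given one."""
    return [e["src"] for e in entries if e["status"] == status]


entries = load_entries(SAMPLE)
approved = terms_with_status(entries, "approved")
forbidden = terms_with_status(entries, "forbidden")
```

In a setup like this, a tool might register only the `approved` entries with an MT system's user dictionary and flag `forbidden` source terms during review; that workflow is likewise an assumption rather than something the abstract prescribes.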

Publication date: 2011